In [1]:
%matplotlib inline
In [2]:
import numpy as np
import pandas as pd
In [3]:
pwd
Out[3]:
In [4]:
cd output
In [5]:
ls
The file `rankings_20170425.csv` contains the ranked predictions for the test set.
In [6]:
ranking_frame = pd.read_csv('rankings_20170425.csv')
In [7]:
ranking_frame.columns
Out[7]:
The rows are sorted by predicted probability in descending order. Note that `rrover` has more True values at the top of the rankings than at the bottom.
In [8]:
ranking_frame.rrover.head(20)
Out[8]:
In [9]:
ranking_frame.rrover.tail(20)
Out[9]:
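To put a number on this comparison, one could compute the share of True values among the first and last 20 rows. The following is a minimal sketch on a synthetic frame: the data is made up for illustration, `demo_frame` and `true_share` are hypothetical names, and only the column names `probability` and `rrover` come from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ranking_frame (illustrative data, not the real rankings).
rng = np.random.default_rng(0)
probability = np.sort(rng.uniform(0, 1, 200))[::-1]  # descending, like the rankings file
rrover = rng.uniform(0, 1, 200) < probability         # True is more likely when probability is high
demo_frame = pd.DataFrame({'probability': probability, 'rrover': rrover})

def true_share(series):
    """Return the fraction of True values in a boolean Series."""
    return series.astype(bool).mean()

top_share = true_share(demo_frame.rrover.head(20))
bottom_share = true_share(demo_frame.rrover.tail(20))
print(top_share, bottom_share)
```

On the real data, the same two calls on `ranking_frame.rrover` would give the corresponding shares, assuming `rrover` holds True/False values.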
Let's plot the fraction of True values in each probability decile (bin 0 holds the lowest probabilities, bin 9 the highest). These fractions should roughly follow the trend in the calibration plot.
In [10]:
ranking_frame['bins'] = pd.qcut(ranking_frame.probability, 10, labels=False)
In [11]:
grouped = ranking_frame.groupby('bins')
In [12]:
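def get_ratio(series):
    # Fraction of True values in the group; unlike value_counts()[1],
    # this does not depend on index order and works when a group has no True values
    ratio = series.astype(bool).mean()
    return ratio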
In [13]:
grouped['rrover'].apply(get_ratio).plot(kind='bar')
Out[13]:
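The link to the calibration plot can be made explicit by putting the mean predicted probability of each decile next to its observed fraction of True values. The sketch below uses synthetic data, since the real file is not reproduced here. `demo_frame` and `decile_summary` are hypothetical names, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ranking_frame (illustrative data only).
rng = np.random.default_rng(1)
probability = rng.uniform(0, 1, 1000)
rrover = rng.uniform(0, 1, 1000) < probability  # outcomes drawn to match the probabilities
demo_frame = pd.DataFrame({'probability': probability, 'rrover': rrover})

# labels=False yields integer bin codes 0..9; bin 0 holds the lowest probabilities.
demo_frame['bins'] = pd.qcut(demo_frame.probability, 10, labels=False)

# Per decile: mean predicted probability vs. observed fraction of True values.
decile_summary = demo_frame.groupby('bins').agg(
    mean_probability=('probability', 'mean'),
    true_fraction=('rrover', 'mean'),
)
print(decile_summary)
```

For a reasonably calibrated model the two columns should be close in every row. Large gaps in particular deciles would point to over- or under-confident predictions in that range.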